Chapter 5 Results

Before the analysis, we first show the current state of the spread of COVID-19 in the United States. All graphs below use data collected on 2021-12-09.

The map below shows the distribution of cumulative confirmed cases in the United States as of 2021-12-09. Each red point represents the number of confirmed cases in a city, and the size of the point scales with that number: the higher the number, the larger the point. From this map, we can see that larger cities have more cases.

To make this clearer, we use bar charts to show the current cumulative confirmed cases and number of deaths in the 50 states. Comparing the two graphs shows that states with more confirmed cases generally have more deaths. From these two graphs, we can see that California, Texas, Florida, New York, and Illinois are the top 5 states by confirmed cases, and that California, Texas, Florida, New York, and Pennsylvania are the top 5 states by number of deaths.

5.1 Impact of the vaccination on the spread of the epidemic

To show the impact of vaccination on the spread of the epidemic, we first want to find which of the 12 features in our vaccination data have the strongest correlation with the number of cases. We therefore use a correlation heatmap to select the features most relevant to confirmed cases and deaths.
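The selection step described above can be sketched in a few lines. This is a minimal Python illustration, not the code used to produce the heatmap: it assumes the selection is based on the absolute Pearson correlation, the helper names `pearson` and `rank_features` are hypothetical, and the toy numbers and two of the feature names are invented for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def rank_features(features, target, top=4):
    """Return the `top` feature names ordered by |correlation| with `target`."""
    scores = {name: pearson(values, target) for name, values in features.items()}
    return sorted(scores, key=lambda name: abs(scores[name]), reverse=True)[:top]


# Toy numbers for illustration only; they are NOT taken from the report's data.
# Only "people_vaccinated" is a feature named in the text; the other two names
# are placeholders.
toy_features = {
    "people_vaccinated": [10, 20, 30, 40, 50],
    "placeholder_feature_a": [5, 3, 6, 2, 4],
    "placeholder_feature_b": [0.9, 0.7, 0.8, 0.6, 0.85],
}
toy_cases = [12, 19, 33, 41, 48]

top_features = rank_features(toy_features, toy_cases, top=2)
print(top_features)
```

Ranking by absolute value keeps strongly negative correlations in the selection, which matters here because a feature that moves against case counts is as informative as one that moves with them.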

From the graph above, we can see that the 4 features with the highest correlations are total vaccinations, total distributed, people vaccinated, and people fully vaccinated. We will use people vaccinated and people fully vaccinated as the two features that represent vaccination. People vaccinated includes both people with one dose and people who are fully vaccinated, while people fully vaccinated counts only those who are fully vaccinated. Below are the bar charts for these two features in the 50 states.

The chart for people vaccinated indicates that California, Texas, Florida, New York, and Pennsylvania are the top 5 states by number of people vaccinated, which matches the top 5 states by number of deaths. This may be because these five states have large populations, which would place them at the top for both vaccinations and deaths. However, none of them is at the top of the list for people fully vaccinated. This may indicate that one-dose vaccination cannot effectively stop the spread of the epidemic, whereas full vaccination can.

To show the relationship between people vaccinated, people fully vaccinated, number of cases, and number of deaths more clearly, we plot time series graphs of each feature for the top 5 states mentioned above: California, Texas, Florida, New York, and Pennsylvania.

In the original graph, there is a significant outlier in the people_vaccinated daily growth graph. This is because the people_vaccinated information for Pennsylvania is missing from October 2nd to November 28th. Although we tried to fill the gap with previous data, an unreasonably large spike remained, so we deleted the data for that time period. After cleaning the data, the graph shows that while the number of vaccinations is increasing, the number of confirmed cases and deaths grows at a decreasing rate. When the number of vaccinations stays low, confirmed cases and deaths also grow slowly in the short run, but in the long run their growth increases and can even reach a peak.
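The spike described above is what one would expect when a gap in a cumulative series is filled with the last known value: all the growth that accumulated during the gap lands on the single day reporting resumes. The Python sketch below illustrates this under that assumption. The helper names `forward_fill` and `daily_growth` are hypothetical, and the series is made up rather than taken from the Pennsylvania data.

```python
def forward_fill(series):
    """Replace each missing value (None) with the last observed value."""
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def daily_growth(cumulative):
    """Day-over-day differences of a cumulative series.

    A difference is reported only when both days are observed, so a gap
    yields None entries instead of one artificial spike.
    """
    growth = []
    for prev, curr in zip(cumulative, cumulative[1:]):
        if prev is None or curr is None:
            growth.append(None)
        else:
            growth.append(curr - prev)
    return growth


# Made-up cumulative counts with a three-day reporting gap (illustration only).
series = [100, 110, None, None, None, 190, 200]

naive = daily_growth(forward_fill(series))   # gap filled with previous data
masked = daily_growth(series)                # gap left out of the growth series

print(naive)   # [10, 0, 0, 0, 80, 10]
print(masked)  # [10, None, None, None, None, 10]
```

Dropping the affected days, as done in the report, corresponds to the `masked` variant: the plot loses a stretch of points but no longer shows growth that did not happen on a single day.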

To show this trend more clearly, instead of focusing only on these five states, we use the data for the whole United States. The graph below shows the time series of confirmed case growth and people vaccinated growth in the United States.

From the graph, we can see a trend similar to the one above. The period from January 2021 to April 2021 shows clearly that when the number of people vaccinated grew quickly, the growth of confirmed cases slowed. Between April 2021 and July 2021, confirmed case growth still decreased while vaccination growth slowed, but at a lower rate than in the previous period. This trend was confirmed in the later period after July 2021: as vaccination growth stayed low, confirmed case growth turned upward and reached its highest point around September 2021. Therefore, this plot supports the conclusion that an increase in vaccinations is associated with a decrease in confirmed cases and thus helps to prevent the spread of COVID-19.

5.2 Impact of the hospital capacity and inpatient rate on the death rate

In addition to the relationship between vaccines and COVID-19, we also want to see whether hospital capacity and the inpatient rate have an impact on the COVID-19 mortality rate.

We first draw a scatter plot to show the relationship between hospital capacity and the death rate. Because states with larger populations have more hospitals, we remove the effect of population by dividing the total number of beds in each state by that state's total population, which gives the number of beds as a share of the population.
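The normalization above is a simple ratio, shown here as a minimal Python sketch. The function name `beds_per_capita` and the two states with their numbers are hypothetical and serve only to show why raw bed counts and normalized capacity can rank states differently.

```python
def beds_per_capita(beds, population):
    """Divide each state's bed count by its population.

    States missing from either mapping are skipped rather than guessed.
    """
    return {
        state: beds[state] / population[state]
        for state in beds
        if state in population and population[state] > 0
    }


# Hypothetical states and numbers, for illustration only.
toy_beds = {"State A": 30_000, "State B": 6_000}
toy_population = {"State A": 10_000_000, "State B": 1_000_000}

capacity = beds_per_capita(toy_beds, toy_population)
print(capacity)
```

In this toy case State A has five times as many beds as State B, yet State B has twice the capacity per resident, which is the effect the normalization is meant to expose.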

According to the graph, there is no clear trend between beds as a share of the population and the death rate, especially when we ignore the state with a value above 0.035. This differs from our initial expectation that the mortality rate would drop as a state's hospital capacity increases. There are several possible reasons. The first is inaccurate mortality data: because the symptoms of COVID-19 are similar to those of the flu, many people may mistake COVID-19 for the flu, which biases the statistics. The second is that other factors, such as differences between states in the number of ventilators or in isolation rules, also affect mortality. Therefore, although this graph shows no obvious relationship between hospital capacity and death rate, we cannot draw a definite conclusion.

We then plot a scatter plot to show the relationship between hospitalization rate and the death rate.

From the graph above, we can see that the death rate increases as the hospitalization rate increases. There are two possible explanations. The first is that people do not go to the hospital unless they are in very bad condition, so as the number of inpatients increases, the death rate increases. The second concerns the time period: there was no very effective treatment for COVID-19 patients during that period, which would also produce a positive relationship between the number of inpatients and the death rate.

5.3 Impact of the

We group the countries in the data by continent. Each country has a single life expectancy value, so for each continent we plot each country's total cases per million against its life expectancy. We can see that countries with higher life expectancy have more total cases per million.

## # A tibble: 193 x 2
##    location            `total_cases[length(total_cases)]`
##  * <chr>                                            <dbl>
##  1 Afghanistan                                     157508
##  2 Albania                                         202295
##  3 Algeria                                         211859
##  4 Andorra                                          18815
##  5 Angola                                           65301
##  6 Antigua and Barbuda                               4148
##  7 Argentina                                      5346242
##  8 Armenia                                         341058
##  9 Australia                                       222260
## 10 Austria                                        1207336
## # ... with 183 more rows

This graph lets us analyze the progress of total cases per million for the following countries: China, India, Sweden, Russia, United Kingdom, and United States. A flattening curve means that the rate of infections is slowing down. The cumulative curve shows how the cumulative number of infection cases changes every day. This can help hospitals manage their caseload and prepare and allocate wards, medical equipment, and medical supplies.

These two graphs let us analyze the progress of the number of new cases and of the number of new cases per million, broken down by location, for the following countries: China, India, Sweden, Russia, United Kingdom, and United States. They show the changes in the number of new infections per day, so we can see the number of new cases in each country in each time period and compare countries within the same time period. We can see a sudden increase in the number of new infections in India over a few days, while the United States maintained a sustained number of new infections for a period of time.

Life expectancy reflects a country's development level and the proportion of elderly people in its population. The longer a country's life expectancy, the greater the proportion of elderly people in the total population. Therefore, we analyzed the relationship between the number of confirmed cases of COVID-19 and life expectancy. According to the data and statistical analysis provided by WHO, we can see that there is a positive correlation between life expectancy and confirmed cases of coronavirus.

## 
##  One Sample t-test
## 
## data:  data_relation$total_cases_per_million
## t = 21.06, df = 2280, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  10398.08 12533.34
## sample estimates:
## mean of x 
##  11465.71
## 
##  One Sample t-test
## 
## data:  data_relation$life_expectancy
## t = 880.18, df = 2280, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  71.84826 72.16912
## sample estimates:
## mean of x 
##  72.00869
## 
##  Pearson's product-moment correlation
## 
## data:  data_relation$total_cases_per_million and data_relation$life_expectancy
## t = 23.453, df = 2279, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4072702 0.4734161
## sample estimates:
##       cor 
## 0.4409417
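
The figures in the correlation output above are linked by standard formulas: the test statistic is t = r * sqrt(df) / sqrt(1 - r^2), and the confidence interval comes from the Fisher z-transformation with standard error 1 / sqrt(n - 3). The Python sketch below recomputes both from the reported r and df as a consistency check. It assumes that this is how the interval was obtained and hard-codes 1.959964 as the 97.5% standard normal quantile.

```python
from math import atanh, sqrt, tanh

# Values copied from the Pearson correlation output above.
r = 0.4409417
df = 2279
n = df + 2  # number of paired observations

# t statistic for the null hypothesis of zero correlation.
t_stat = r * sqrt(df) / sqrt(1 - r ** 2)

# 95% confidence interval via the Fisher z-transformation (assumed method).
z = atanh(r)
half_width = 1.959964 / sqrt(n - 3)
ci_low, ci_high = tanh(z - half_width), tanh(z + half_width)

# Compare with t = 23.453 and the interval 0.4072702 to 0.4734161 above.
print(round(t_stat, 3), round(ci_low, 4), round(ci_high, 4))
```

The interval lies well above zero, which is what supports describing the relationship as a positive correlation, while r of about 0.44 indicates a moderate rather than a strong association.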